Impact of Normalization on Entropy-Based Weights in Hellwig’s Method: A Case Study on Evaluating Sustainable Development in the Education Area

Determining criteria weights plays a crucial role in multi-criteria decision analysis. Entropy is a significant measure in information science, and several multi-criteria decision-making methods use the entropy weight method (EWM). Two approaches to computing entropy-based weights can be found in the literature: one normalizes the data before calculating the entropy values, while the other does not. This paper investigates the effect of normalization on entropy-based weights and on Hellwig's method. To compare the influence of various normalization methods in both the EWM and Hellwig's method, a case study evaluating the sustainable development of EU countries in the education area in 2021 was analyzed. The study used Eurostat data on European countries' realization of the SDG 4 goal. It is observed that vector normalization and sum normalization do not change the entropy-based weights. In the case study, max–min normalization influenced the EWM weights; at the same time, these weights had only a very weak impact on the final rankings of countries with respect to achieving the SDG 4 goal, as determined by Hellwig's method. The results are compared with the outcome obtained by Hellwig's method with equal weights. A simulation study was conducted by modifying Eurostat data to investigate how the relationships among the criteria affect entropy-based weights and Hellwig's method results under different normalizations.


Introduction
Multiple criteria decision-making (MCDM) has evolved as a crucial component of operations research, focusing on developing mathematical tools to facilitate the subjective evaluation of performance criteria by decision-makers [1]. MCDM techniques address situations where decisions involve multiple, often conflicting, criteria or objectives. These methods help decision-makers assess and prioritize various alternatives based on different criteria, taking into account the inherent complexity and subjectivity of decision-making processes. Several approaches within MCDM have been investigated, each customized to suit the specific decision contexts and preferences of decision-makers [1][2][3].
The weights assigned by decision-makers (DMs) to differentiate the importance of criteria play a pivotal role in the multi-criteria decision-making process. Numerous methods exist for determining these weights [4][5][6]; they can be classified into two categories [7,8]: subjective and objective. The subjective approach relies on evaluations provided by DMs, whereas the objective approach relies on intrinsic information contained in the dataset describing the criteria performances. Various subjective methods are available, including the analytic hierarchy process (AHP) [9,10], rank-based [11,12], direct rating [13,14], Delphi method [15], and point allocation methods [13,14].
Entropy 2024, 26, 365 2 of 19
On the other hand, examples of objective approaches include CRiteria Importance Through Intercriteria Correlation (CRITIC) [16], standard deviation (SD), and the entropy weight method (EWM) [15]. Among methods for determining objective weights, the EWM is widely adopted and particularly valuable in decision-making processes, especially when there is a lack of prior knowledge about the relative importance of criteria. In the literature, there are two approaches to determining the EWM: one normalizes the decision matrix before calculating the entropy value for each criterion, and the other calculates the entropy value directly from the decision matrix [17]. The first approach commonly employs the max-min normalization method.
Let us note that normalization is a crucial stage in most MCDM methods: it maps all criteria onto a uniform scale and thus enables a comparison among alternatives. Several papers emphasize the importance of the choice of normalization technique and its impact on the rankings of alternatives [18][19][20]. Although two variants of entropy-based weights are frequently used in research (see Section 2.2), only studies [17,21] have analyzed the effects of normalization in the EWM on TOPSIS. Therefore, it is vital to analyze the effects of normalization in the EWM in combination with other multi-criteria techniques.
This paper addresses the impact of normalization on entropy weights and on the resulting rank ordering in Hellwig's method. Hellwig's method is a multi-criteria decision-making technique that ranks alternatives based on their proximity to the ideal solution [22,23]. In the classical Hellwig approach, standardization (S), based on the mean and standard deviation of the observations, is used to handle performances measured on different scales before the distances are determined. However, in some studies, S has been replaced by max-min normalization or vector normalization. Therefore, it is also vital to analyze the effects of different normalization methods on the rank ordering of alternatives produced by Hellwig's approach.
We use the problem of evaluating sustainable development in the education area of EU countries, based on real-world Eurostat data, to show the influence of various normalization methods on entropy-based weights and Hellwig-based rankings. Surprisingly, the results in our specific decision-making context show that although the choice of normalization can lead to substantially different weights, it only marginally influences the final rankings. Therefore, a simulation study was conducted to verify whether these results admit a more general interpretation when the decision-making context changes. In a series of replications, we modified Eurostat data to investigate and discuss the compatibility between entropy-based weights and Hellwig's method.
The objectives and contributions of this study are as follows:
• Compare the performance of two variants of entropy-based weight methods in assessing sustainable development in education.
• Evaluate the effectiveness of three normalization formulas in Hellwig's method for assessing sustainable development in education.
• Investigate and compare the combined performance of entropy-based weight methods and normalization within Hellwig's method for assessing sustainable development in education.
• Conduct the simulation study by modifying Eurostat data to discuss and investigate the sensitivity of the obtained results and provide more general conclusions regarding the influence of normalization on entropy-based weights in Hellwig's approach.
The rest of the paper is structured as follows: Section 2 introduces the concept of the EWM and presents a short literature review concerning the application of entropy weights in decision-making. Section 3 introduces Hellwig's method. Section 4 presents the results, and Section 5 discusses the findings from the simulation study. Finally, Section 6 presents the conclusions.

The Preliminaries and Literature Review
This section introduces the concept of the entropy-based weight method and presents related work that encompasses the EWM in decision-making problems.

Entropy-Based Weight Method
The concept of entropy, originally developed by Claude Shannon [24] in his seminal paper titled 'A Mathematical Theory of Communication' published in 1948, has become a significant measure widely used in information theory.In information theory, entropy measures the uncertainty or randomness associated with a random variable.It quantifies the average amount of information required to describe the outcomes of a random process.The higher the entropy, the greater the uncertainty.
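To make the definition concrete, here is a minimal Python sketch of Shannon entropy for a discrete distribution, using the natural logarithm as in the EWM formulas that follow. The helper name `shannon_entropy` is hypothetical, and the example distributions are illustrative only.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy H = -sum p ln p (natural log, i.e., in nats).

    Terms with p = 0 contribute nothing, following the convention 0 * ln 0 = 0.
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be a probability distribution")
    return sum(-p * math.log(p) for p in probs if p > 0)


# A uniform distribution is the most uncertain one (H = ln k for k outcomes);
# a degenerate distribution carries no uncertainty (H = 0).
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
degenerate = shannon_entropy([1.0, 0.0, 0.0, 0.0])
print(uniform, math.log(4))
print(degenerate)
```

The larger the entropy, the closer the distribution is to uniform, which is the property the EWM exploits when it turns entropies into weights.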
In decision theory, entropy is often used to assess the uncertainty or information content of different alternatives, particularly in determining the weights of criteria in the MCDM process. The decision matrix contains a certain amount of information. Since each column of this matrix describes the single-criterion performances of the alternatives, the EWM may allow the weights to be calculated objectively from the differences in the amount of information provided by each criterion. Thereby, the impact of subjective judgments is minimized [25][26][27][28]. For instance, a criterion has less influence when all alternatives share similar values for that specific criterion. Additionally, if all values are the same, that attribute can be eliminated from consideration [15]. In the literature, two variants of the entropy-based weight method are presented. The first variant of the EWM involves no normalization, while the second includes normalization, usually max-min, before the entropy value of each criterion is calculated [17].
Let us assume that we have $m$ alternatives $A_1, A_2, \ldots, A_m$ and $n$ decision criteria $C_1, C_2, \ldots, C_n$. The general framework for calculating the EWM in multiple-criteria decision-making is outlined as follows:
Step 1. Determination of the decision matrix.
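The decision matrix $D$ has the form:
$$D = [x_{ij}]_{m \times n} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}, \quad (1)$$
where $x_{ij}$ is the value of the $j$-th criterion for the $i$-th alternative, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$.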
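Step 2. Normalization of the decision matrix.
A normalized decision matrix has the form:
$$Z = [z_{ij}]_{m \times n} = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1n} \\ z_{21} & z_{22} & \cdots & z_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ z_{m1} & z_{m2} & \cdots & z_{mn} \end{bmatrix}, \quad (2)$$
where $z_{ij}$ is the normalized value of $x_{ij}$ for the $j$-th criterion and the $i$-th alternative, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$.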
Step 3. Calculation of the information entropy of each criterion.
The information entropy $E_j$ for the $j$-th criterion is calculated by the following equation:
$$E_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}, \quad (3)$$
where
$$p_{ij} = \frac{x_{ij}}{\sum_{k=1}^{m} x_{kj}} \quad (4)$$
for the EWM without normalization, i.e., when Step 2 is omitted, or
$$p_{ij} = \frac{z_{ij}}{\sum_{k=1}^{m} z_{kj}} \quad (5)$$
for the EWM with normalization in Step 2. In particular, when $x_{ij} = 0$ (or $z_{ij} = 0$), it is assumed that $p_{ij} \ln p_{ij} = 0$ for convenience in calculations. To avoid $x_{ij} = 0$ or $z_{ij} = 0$, Zhu et al. [29] proposed a modified formula involving a constant $C$, which must satisfy the condition specified in [29].
Step 4. Calculation of weights.
The weight of the $j$-th criterion is calculated by the following equation:
$$w_j = \frac{1 - E_j}{\sum_{k=1}^{n} (1 - E_k)},$$
where $E_j$ is the normalized information entropy calculated using Formula (3).
It is easy to check that $0 \le w_j \le 1$ ($j = 1, \ldots, n$) and $\sum_{j=1}^{n} w_j = 1$, according to the properties of entropy.
The lower the information entropy $E_j$, the higher the weight $w_j$. In other words, the larger the degree of divergence $1 - E_j$, the greater the weight assigned to the $j$-th criterion. A large value of $1 - E_j$ (low entropy) signifies that the criterion values are unevenly distributed across the alternatives, so the criterion discriminates well among them and holds more decision-relevant information; it therefore receives a greater weight. Conversely, a high entropy indicates that the criterion values are nearly uniform across the alternatives, which leads to a lower weight. Hence, entropy offers an objective approach to establishing criterion weights. Tackling uncertainty through entropy enhances the robustness of decision-making, especially in scenarios with incomplete or ambiguous information.
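The EWM steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; it assumes an m × n list of non-negative values with positive column sums (raw $x_{ij}$ for the variant without normalization, or normalized $z_{ij}$), and the function names and toy matrix are hypothetical. It reproduces the behavior described in the text: the more evenly a criterion's values are spread over the alternatives (entropy close to 1), the smaller its weight, and a constant criterion receives a weight that is zero up to rounding.

```python
import math


def criterion_entropy(column):
    """Normalized entropy E_j in [0, 1] of one criterion column (Formula (3))."""
    m = len(column)
    if m < 2:
        raise ValueError("at least two alternatives are required")
    total = sum(column)
    if total <= 0 or any(v < 0 for v in column):
        raise ValueError("values must be non-negative with a positive sum")
    p = [v / total for v in column]                    # p_ij
    h = sum(-pi * math.log(pi) for pi in p if pi > 0)  # 0 * ln 0 is taken as 0
    return h / math.log(m)                             # scale by 1 / ln m


def entropy_weights(matrix):
    """EWM weights w_j = (1 - E_j) / sum_k (1 - E_k); rows are alternatives."""
    n = len(matrix[0])
    divergence = [1.0 - criterion_entropy([row[j] for row in matrix])
                  for j in range(n)]
    total = sum(divergence)
    if total <= 0:
        raise ValueError("all criteria are constant; weights are undefined")
    return [d / total for d in divergence]


# Hypothetical toy data: C1 is strongly dispersed, C2 less so, C3 is constant.
X = [[1.0, 10.0, 7.0],
     [2.0, 30.0, 7.0],
     [3.0, 20.0, 7.0],
     [9.0, 25.0, 7.0]]
w = entropy_weights(X)
print([round(v, 3) for v in w])
```

Because $1 - E_j$ of a constant column is zero only up to floating-point rounding, a more defensive version might clip tiny negative divergences to zero before normalizing.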
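The sum method (SM) and vector normalization (VN) are two frequently used normalization formulas in decision-making methods. Their calculation equations are as follows [15]:
$$z_{ij} = \frac{x_{ij}}{\sum_{k=1}^{m} x_{kj}} \ \text{(SM)}, \qquad z_{ij} = \frac{x_{ij}}{\sqrt{\sum_{k=1}^{m} x_{kj}^{2}}} \ \text{(VN)}.$$
We can verify that the SM and VN do not alter the entropy-based weights [17]; therefore, there is no point in using them when calculating the EWM in Step 2. Indeed, each of these formulas divides the $j$-th column by a positive constant $c_j$ (the column sum for SM, the Euclidean norm of the column for VN), and this constant cancels in the proportions:
$$p_{ij} = \frac{z_{ij}}{\sum_{k=1}^{m} z_{kj}} = \frac{x_{ij}/c_j}{\sum_{k=1}^{m} x_{kj}/c_j} = \frac{x_{ij}}{\sum_{k=1}^{m} x_{kj}}.$$
According to the literature gathered in Section 2.2, the max-min (MM) normalization formula is the most commonly employed in Step 2 of the EWM. The calculation equation for the MM method is as follows:
$$z_{ij} = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}. \quad (12)$$
Therefore, in further analyses, we concentrate on two variants of the EWM: one without normalization (EWMn) and the other with MM normalization (EWMMM) before the entropy value of each criterion is calculated.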
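The invariance argument above, and the contrasting effect of max-min normalization, can be checked numerically. The Python sketch below is an illustration under our own assumptions (hypothetical helper names, a two-criterion toy matrix with no constant column, and the basic max-min form applied to every column); it is not the code used in the study. On this toy matrix, SM and VN reproduce the weights computed from the raw data, whereas MM moves weight from the more dispersed criterion toward the less dispersed one.

```python
import math


def entropy_weights(matrix):
    """EWM weights from an m x n matrix of non-negative values (rows = alternatives)."""
    m = len(matrix)
    divergence = []
    for col in zip(*matrix):
        total = sum(col)
        p = [v / total for v in col]
        e = sum(-pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        divergence.append(1.0 - e)
    s = sum(divergence)
    return [d / s for d in divergence]


def normalize(matrix, method):
    """Column-wise sum (SM), vector (VN), or max-min (MM) normalization."""
    columns = []
    for col in zip(*matrix):
        if method == "SM":
            total = sum(col)
            columns.append([v / total for v in col])
        elif method == "VN":
            norm = math.sqrt(sum(v * v for v in col))
            columns.append([v / norm for v in col])
        elif method == "MM":
            lo, hi = min(col), max(col)   # assumes max > min in every column
            columns.append([(v - lo) / (hi - lo) for v in col])
        else:
            raise ValueError(f"unknown method: {method}")
    return [list(row) for row in zip(*columns)]


# Hypothetical toy data: 4 alternatives, 2 criteria.
X = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [9.0, 25.0]]
w_raw = entropy_weights(X)                   # EWMn
w_sm = entropy_weights(normalize(X, "SM"))
w_vn = entropy_weights(normalize(X, "VN"))
w_mm = entropy_weights(normalize(X, "MM"))   # EWMMM
print(w_raw, w_sm, w_vn, w_mm)
```

SM and VN agree with the raw weights only up to floating-point rounding, so comparisons should use a tolerance rather than exact equality.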
Dong et al. [32] investigated the risk assessment of water security during drought periods using entropy-weighted methods. Zhang et al. [28] applied TOPSIS and entropy-based weights to evaluate the competitiveness of tourism destinations. Zhang and Wang [38] employed an entropy-weight approach to assess Chongqing's water resource security between 2000 and 2011. This evaluation aimed to identify the origins of pressure on the water resources system and gauge the effectiveness of current response measures. Wu et al. [39] investigated the sensitivity of entropy-based weights for assessing water quality, employing large stochastic samples in their study. Ding et al. [40] presented a comprehensive evaluation of urban sustainable development in China based on the TOPSIS method with entropy-based weights. Zeng and Huang [41] proposed a synthetic assessment and analysis method incorporating nine risk indices guided by natural disaster risk assessment principles; the AHP method was combined with entropy theory to calculate indicator weights that integrate subjective and objective weighting. Xu et al. [42] proposed an integrated methodology incorporating an urban flood inundation model, an improved entropy weight method, and a k-means clustering algorithm to evaluate urban flood risk. The weights were calculated by integrating the entropy weight method and the analytic hierarchy process (AHP) method. Shen and Liao [43] used the AHP and the entropy method to develop a risk evaluation model for the food cold chain. Mukhametzyanov [26] conducted a comparative analysis of three objective methods for determining criteria weights in multi-criteria decision-making (entropy, CRITIC, and standard deviation) and presented various proposals for aggregating the weights. What the studies mentioned above have in common is that they apply max-min normalization in Step 2 of the EWM.
At the same time, in a series of papers, Step 2 has been omitted from the EWM calculation [33,34,44,45]. Aras et al. [33] assessed Garanti Bank's corporate sustainability performance by examining economic, social, and environmental factors using TOPSIS with an entropy-based weighting method. Dang and Dang [34] assessed the environmental quality of the Organization for Economic Co-operation and Development (OECD) countries using the VIKOR method, with the criteria weights determined through the entropy weight method. Tian [45] incorporated the EWM into TOPSIS to evaluate corporate internal control. Hafezalkotob and Hafezalkotob [44] proposed a comprehensive form of the MULTIMOORA technique (an extension of multi-objective optimization on the basis of ratio analysis, MOORA) with incorporated entropy-based weights for the analysis of the material selection process.
He et al. [25] introduced a method for determining weights and aggregating models in multi-group decision-making.They utilized the entropy weighting technique and the principle of minimum cross-entropy in their approach.Yue [31] applied entropy-based weights in group decision-making with hybrid preference representations.The paper proposes a comprehensive group decision model that combines crisp values with interval data utilizing entropy-based weights.
The general framework of Hellwig's method is as follows. We have $m$ objects (alternatives) $A_1, A_2, \ldots, A_m$ and $n$ variables (criteria) $C_1, C_2, \ldots, C_n$. In the first step, the data matrix is established:
$$X = [x_{ij}]_{m \times n},$$
where $x_{ij}$ is the value of the $j$-th variable (criterion) for the $i$-th object (alternative), $i = 1, \ldots, m$, $j = 1, \ldots, n$.
Next, the vector of weights is determined:
$$W = [w_1, w_2, \ldots, w_n],$$
where $w_j > 0$ ($j = 1, \ldots, n$) is the weight of the variable (criterion) $C_j$ and $\sum_{j=1}^{n} w_j = 1$. It is worth noting that the original Hellwig framework assumes equal weights.
Also, the variables (criteria) are categorized as stimulants (positive variables), corresponding to benefit criteria, and destimulants (negative variables), corresponding to cost criteria.
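The ideal solution $I$ is built using the following equation:
$$I = [x_1^+, x_2^+, \ldots, x_n^+], \quad (15)$$
where
$$x_j^+ = \begin{cases} \max_{i} x_{ij} & \text{for stimulants}, \\ \min_{i} x_{ij} & \text{for destimulants}, \end{cases} \quad (16)$$
for $j = 1, \ldots, n$.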
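In the next step, the normalized matrix $Z$ is determined:
$$Z = [z_{ij}]_{m \times n}, \quad (17)$$
where $z_{ij}$ is a normalized value of $x_{ij}$ ($i = 1, \ldots, m$, $j = 1, \ldots, n$). In the classical approach, standardization (S) is used:
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad (18)$$
where $\bar{x}_j$ and $s_j$ are the mean and standard deviation of the $j$-th variable.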
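In the following step of Hellwig's algorithm, the weighted normalized matrix, denoted as $\bar{D}$, is defined as:
$$\bar{D} = [\bar{x}_{ij}]_{m \times n},$$
where $\bar{x}_{ij} = w_j z_{ij}$ ($i = 1, \ldots, m$, $j = 1, \ldots, n$). Next, the distance of the $i$-th alternative $A_i$ from the ideal $I$ is calculated using the following formula:
$$d_i = \sqrt{\sum_{j=1}^{n} \left(\bar{x}_{ij} - \bar{x}_j^{+}\right)^2}, \quad (21)$$
where $\bar{x}_{ij}$ and $\bar{x}_j^{+}$ are the weighted normalized values of $x_{ij}$ and $x_j^+$, respectively. Hellwig's measure $H_i$ is determined as follows:
$$H_i = 1 - \frac{d_i}{d_0}, \quad (22)$$
where $d_0 = \bar{d} + 2S(d)$, with $\bar{d}$ and $S(d)$ denoting the mean and standard deviation of the distances $d_1, \ldots, d_m$. Finally, a ranking of objects (alternatives) is provided based on the descending values of $H_i$: the higher Hellwig's value, the higher the ranking position of the respective object (alternative).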
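The Hellwig procedure can be sketched in Python as follows. This is a hedged illustration, not the authors' code, and it rests on stated assumptions: the ideal is taken on the raw data (maximum for stimulants, minimum for destimulants) and then normalized with the same column parameters as the data; weights multiply the normalized values; distances are Euclidean; and $d_0 = \bar{d} + 2S(d)$ is computed with the population standard deviation. The function names and the toy data are hypothetical.

```python
import math
import statistics


def _scale_params(col, method):
    """Shift and scale so that z = (x - shift) / scale for S, MM, or VN."""
    if method == "S":      # standardization with the (population) SD: an assumption
        return statistics.fmean(col), statistics.pstdev(col)
    if method == "MM":     # max-min; assumes max > min
        return min(col), max(col) - min(col)
    if method == "VN":     # vector normalization
        return 0.0, math.sqrt(sum(v * v for v in col))
    raise ValueError(f"unknown normalization: {method}")


def hellwig(matrix, weights, stimulant, method="S"):
    """Return Hellwig's measures H_i, one per alternative (row)."""
    m, n = len(matrix), len(matrix[0])
    sq = [0.0] * m
    for j in range(n):
        col = [row[j] for row in matrix]
        ideal = max(col) if stimulant[j] else min(col)    # x_j^+ on raw data
        shift, scale = _scale_params(col, method)
        ideal_w = weights[j] * (ideal - shift) / scale    # weighted normalized ideal
        for i in range(m):
            x_w = weights[j] * (col[i] - shift) / scale   # weighted normalized value
            sq[i] += (x_w - ideal_w) ** 2
    d = [math.sqrt(s) for s in sq]                        # Euclidean distances to I
    d0 = statistics.fmean(d) + 2 * statistics.pstdev(d)
    return [1 - di / d0 for di in d]


# Hypothetical data: C1 is a destimulant, C2 and C3 are stimulants.
X = [[2.0, 90.0, 60.0],
     [5.0, 70.0, 40.0],
     [3.0, 80.0, 55.0],
     [8.0, 50.0, 30.0]]
w = [1 / 3, 1 / 3, 1 / 3]
for method in ("S", "MM", "VN"):
    h = hellwig(X, w, [False, True, True], method)
    ranking = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    print(method, [round(v, 3) for v in h], ranking)
```

In this toy matrix the first alternative coincides with the ideal, so its measure is exactly 1, and each alternative is uniformly closer to or farther from the ideal than the others, so all three normalizations give the same ranking; with real data, agreement across normalizations is an empirical question.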

A Case Study: Evaluation of Sustainable Development in the Education Area by Hellwig's Framework
4.1. The Source of Data
The 2030 Agenda for Sustainable Development, adopted by all United Nations Member States in 2015, presents a collective roadmap for fostering peace and prosperity for both people and the planet, spanning the present and the future. At its core are the 17 Sustainable Development Goals (SDGs), which serve as a pressing call to action for all nations, irrespective of their development status, to engage in a global partnership [71]. Education is pivotal for economic growth and job creation, as it improves employability, productivity, innovation, and competitiveness. In a broader context, education is also a prerequisite for achieving many other Sustainable Development Goals (SDGs) [72,73]. Monitoring progress on SDG 4, referred to as 'Quality Education,' in the EU context, focuses on primary education, higher education, adult learning, and digital skills [63,64,74].
This study aims to assess and compare the implementation of SDG 4 across European Union member states using Hellwig's method with various entropy-based weights. We employed Eurostat data for 2021 on the Sustainable Development indicators related to education (SDG 4) [75]. Education, as a complex phenomenon, was characterized using five criteria [75]. The set of indicators for SDG 4 encompasses key aspects intended to monitor progress across diverse educational levels and domains. Table 1 presents the five indicators used to assess SDG 4 in EU countries in 2021.

Results
To observe the impact of normalization on Hellwig's results, we designed a comparative study in which we combined a normalization mode for the EWM with a normalization formula for Hellwig's method:
• combination mode I (CMI): none in the EWM and S in Hellwig's method;
• combination mode II (CMII): MM in the EWM and S in Hellwig's method;
• combination mode III (CMIII): none in the EWM and MM in Hellwig's method;
• combination mode IV (CMIV): MM in the EWM and MM in Hellwig's method;
• combination mode V (CMV): none in the EWM and VN in Hellwig's method;
• combination mode VI (CMVI): MM in the EWM and VN in Hellwig's method.
The entropy-based weights calculated without normalization (EWMn) and with MM normalization (EWMMM) in Step 2 are presented in Table 2. Garuti's compatibility index G [76] is employed to compare the systems of weights; it is calculated as described in [76]. The index G = 1 indicates full compatibility of two systems of weights, while G = 0 signifies total incompatibility. The Garuti index value in our study, $G(\text{EWMn}, \text{EWMMM}) = 0.573$, indicates weak compatibility of the two systems of weights. Let us note that the weight coefficients correspond to the criteria's coefficients of variation (see Table 1). The most important criterion is $C_4$ (coefficient of variation 64.73%), followed by $C_1$ (40.84%), $C_2$ (21.72%), and $C_5$ (21.10%), in that order. The least important criterion is $C_3$, with a coefficient of variation of 8.40%. The comparison of the two systems of weights is presented in Figure 1. We can observe that max-min normalization in the EWMMM flattened the differences between the criteria weights: the most important criteria ($C_1$, $C_4$) obtained lower weights, while the least important ones ($C_2$, $C_3$, $C_5$) received higher weights compared to the EWMn.
The ideal solution $I$ (Formulas (15) and (16)) has the form [2.40, 62.60, 100.00, 34.70, 79.18]. The criteria values are normalized using the standardization (Formula (18)), max-min (Formula (12)), and vector normalization (Formula (9)) methods. Next, the distances between the alternatives (countries) and $I$ are calculated (Formula (21)). Finally, Hellwig's measures for the six combination modes are determined (Formula (22)). Hellwig's measure values and the rankings of countries obtained by the combination modes (two variants of entropy-based weights and three normalization formulas in Hellwig's method) are presented in Table 3.
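The comparison of weight systems above relies on Garuti's compatibility index G [76], whose formula is not reproduced in this section. As a hedged sketch, the Python function below assumes the form commonly attributed to Garuti, $G(a, b) = \frac{1}{2}\sum_{i} (a_i + b_i)\,\frac{\min(a_i, b_i)}{\max(a_i, b_i)}$, for two non-negative weight vectors that each sum to 1; it should be checked against [76] before reuse. Under this form, G = 1 for identical vectors and G = 0 when no criterion carries positive weight in both, matching the interpretation given in the text. The function name and example vectors are hypothetical.

```python
import math


def garuti_compatibility(a, b):
    """Compatibility G of two weight vectors (assumed Garuti form).

    G = 1/2 * sum_i (a_i + b_i) * min(a_i, b_i) / max(a_i, b_i),
    with a and b non-negative and each summing to 1.
    """
    if len(a) != len(b):
        raise ValueError("weight vectors must have the same length")
    if not (math.isclose(sum(a), 1.0) and math.isclose(sum(b), 1.0)):
        raise ValueError("each weight vector must sum to 1")
    g = 0.0
    for ai, bi in zip(a, b):
        hi = max(ai, bi)
        if hi > 0:                 # skip criteria weighted 0 in both vectors
            g += 0.5 * (ai + bi) * min(ai, bi) / hi
    return g


# Hypothetical weight vectors for three criteria.
print(garuti_compatibility([0.5, 0.3, 0.2], [0.4, 0.35, 0.25]))
```

Each term weights the ratio min/max by the average weight of the criterion, so disagreement on heavily weighted criteria lowers G more than disagreement on minor ones.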
While analyzing the positions of the EU countries in the overall rankings obtained using the six combination modes of Hellwig's method, one may observe that the rankings are very similar, as confirmed by the Kendall tau coefficients (Table 4). Interestingly, the rankings differ by only one or two positions.
Moreover, the disparities in Hellwig's values are minimal, as evidenced by Pearson's coefficients (Table 5).
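Rank and score agreement of the kind reported in Tables 4 and 5 can be computed with the plain-Python sketch below: Kendall's tau in its tau-a form (which coincides with tau-b when there are no ties, as in strict rankings) and Pearson's correlation coefficient. The function names and the example rankings are hypothetical; statistical libraries offer equivalent, tie-aware routines.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no ties, as in strict rankings of alternatives.
    """
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)   # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Two rankings of five alternatives that differ by one adjacent swap.
rank_a = [1, 2, 3, 4, 5]
rank_b = [1, 3, 2, 4, 5]
print(kendall_tau(rank_a, rank_b))   # 0.8 (9 concordant pairs, 1 discordant pair)
```

A single swap of adjacent positions among five alternatives already lowers tau from 1 to 0.8, which helps put the reported coefficients in perspective.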
Basic descriptive statistics for six combination modes of Hellwig's measures are presented in Table 3 and Figure 2.
When examining the box plots representing Hellwig's values, we can observe that the distributions obtained for the various combination modes are very similar. Depending on the mode, the range of Hellwig's values among the EU countries is between 0.722 and 0.763, the mean falls between 0.387 and 0.402, and the standard deviation is between 0.191 and 0.201. At the same time, regardless of the combination mode used, the results indicate similarly significant disparities among EU countries in achieving the SDG 4 goal (the pattern of differences was preserved). No country excelled or lagged in all criteria. Sweden, the Netherlands, and Finland received the top scores among EU countries in 2021 across all of Hellwig's modes. High Hellwig's scores were also attained by Denmark, Slovenia, Estonia, and Luxembourg. Conversely, Bulgaria, Romania, and Greece recorded the lowest scores in 2021.
It is indeed a surprising finding that using various techniques for data normalization for EWM-based weights and for Hellwig's algorithm does not cause significant changes in the final evaluation, although it significantly affects the weights. This phenomenon requires deeper consideration and analysis, which we will conduct in the following section.

Discussion
Chen [17] investigated the impact of max-min normalization on the EWM and the relationships between the EWM and TOPSIS with various normalization approaches in TOPSIS. The study showed that normalization can influence the decision outcomes of the entropy-based TOPSIS method: max-min normalization affects the EWM results and fails to represent the diversity of the raw data accurately. The examples presented by Chen show that the systems of weights differ both in values and in the order of importance of the criteria. Chen [17] also does not recommend MM for the entropy weight method and advises VN for the TOPSIS method, arguing that the weights become meaningless if MM is employed in the EWM.
The original Hellwig approach used equal weights for criteria. Maggino and Ruviglioni [77] observed that equal weights were commonly employed in many applications. Greco et al. [78] argued in favor of equal weights for various reasons, such as simplicity of implementation, the absence of a theoretical foundation to support a differentiated weighting scheme, disagreement among decision-makers, and insufficient statistical or empirical evidence. Therefore, to analyze the impact of weights on the final ranking more deeply, we compared the results presented in Table 3 with the results of Hellwig's method H_S, H_MM, and H_VN obtained with equal weights and the S, MM, and VN normalization formulas, respectively (Table 6).
As with the entropy-based weights applied in Hellwig's measure (Tables 4 and 5), the rankings and the disparities in Hellwig's values obtained by H_S, H_MM, and H_VN with equal weights are similar (Tables 7 and 8).
Strong Pearson correlations were obtained for different systems of weights and the same data normalization procedure. For S normalization, we obtained P(H_S, CMI) = 0.872 and P(H_S, CMII) = 0.936; for MM normalization, P(H_MM, CMIII) = 0.873 and P(H_MM, CMIV) = 0.994; and for the VN formula, P(H_VN, CMV) = 0.956 and P(H_VN, CMVI) = 0.965. The Kendall tau correlation coefficients were not as high but still indicated moderately strong associations: for S normalization, K(H_S, CMI) = 0.783 and K(H_S, CMII) = 0.829; for MM normalization, K(H_MM, CMIII) = 0.772 and K(H_MM, CMIV) = 0.812; and for the VN formula, K(H_VN, CMV) = 0.818 and K(H_VN, CMVI) = 0.835.
Our study showed that the weights obtained without normalization and with the MM approach in the EWM are not strongly compatible (Garuti index 0.573). However, they preserve the order of importance of the criteria. It is not unexpected that different objective weighting methods lead to different systems of weights. However, one might be surprised that such distinct systems of weights, combined with three different normalization formulas in Hellwig's measure, result in highly similar rankings (see Tables 1 and 4) and a very strong correlation between Hellwig's values (Tables 1 and 5).
Therefore, we decided to check whether this situation is related to the structure of the Eurostat data used for the analysis (i.e., whether it is case-specific). The Pearson correlation coefficients between the criteria are presented in Table 9. In our case, some of the criteria are clearly moderately to highly correlated across countries. Thus, we decided to check whether this correlation may be a factor explaining the similarity of the rankings despite the dissimilarity of the weights. We organized two simulation studies, each with two scenarios, which amounted to experimenting with different modifications of the Eurostat data.
In Study 1, in each replication (repeated 1000 times), we simulated a data structure similar to the data from Table 1, i.e., consisting of 27 alternatives and five evaluation criteria. We sampled the performances of alternatives for each criterion from a normal distribution based on the distribution observed for that criterion in the original Eurostat data, with its actual mean. Additionally, we enforced the correlations among the single-criterion performances determined for the Eurostat data (see Table 9). In the simulation, we only compared Hellwig's results obtained for two different setups, CMI and CMII, i.e., those that differ in using (or not) MM normalization when determining the weights, while keeping the same standardization-based approach in Hellwig's algorithm. In this way, we can observe how such non-trivial correlations among single-criterion performances may affect the EWM weights and their impact on Hellwig's rank orders of alternatives.
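One replication of the Study 1 data-generating step can be sketched as follows: draw an m × n matrix whose columns are normal with given means and standard deviations and whose correlation structure follows a target correlation matrix, using a Cholesky factor. This is our illustrative reading of the procedure, not the authors' code; the function names and the two-criterion example parameters are hypothetical, since the actual Eurostat means, standard deviations, and correlations are in the paper's tables. Note that normal draws can be negative, which would need handling before the EWM without normalization is applied.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_correlated_normal(m, means, sds, corr, rng=random):
    """Draw m rows whose columns are N(mean_j, sd_j^2) with correlation corr."""
    L = cholesky(corr)
    n = len(means)
    rows = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]           # independent N(0, 1)
        y = [sum(L[j][k] * z[k] for k in range(j + 1)) for j in range(n)]
        rows.append([means[j] + sds[j] * y[j] for j in range(n)])
    return rows


def column_correlation(rows, a, b):
    """Sample Pearson correlation between columns a and b."""
    xa = [r[a] for r in rows]
    xb = [r[b] for r in rows]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    cov = sum((u - ma) * (v - mb) for u, v in zip(xa, xb))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in xa)
                           * sum((v - mb) ** 2 for v in xb))


# Hypothetical parameters; Study 1 used 27 alternatives and five criteria.
rng = random.Random(42)
data = sample_correlated_normal(27, [10.0, 50.0], [2.0, 8.0],
                                [[1.0, 0.7], [0.7, 1.0]], rng)
print(len(data), round(column_correlation(data, 0, 1), 2))
```

Passing an identity matrix as `corr` gives independent criteria with the same marginal means and standard deviations, which corresponds to the relaxed setting used in Study 2.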
The results of simulation Study 1 show that the correlations between Hellwig's indexes for CMI and CMII, as well as between the resulting rankings, are high. However, they are not as high as in our real-world case of evaluating the EU countries (see Figure 3A). The average Pearson coefficient between the countries' performances measured by Hellwig's index equals 0.863 in the simulation. Moreover, even the third quartile (0.93) is smaller than the value we obtained for our case, i.e., 0.987. In fact, the relative frequency of obtaining results that are at least as correlated as in our real-world case is as small as 0.1% (observed in one replication only). The same applies to the Kendall ordinal correlations, though the differences are even more visible here. The Garuti index that measures the similarity of two systems of weights equals 0.45 for our simulation data, which is not very different from what we observed for the empirical data (0.54). However, when we look in detail at the relationships among the weights of the subsequent criteria obtained in each iteration for the EWMn and EWMMM methods, we find that in only 13 replications (1.3%) was the order of weights the same as for the EWMn and EWMMM weights determined for the empirical data. Identically ordered systems of weights have a Kendall tau of 1; in our simulation, the average Kendall tau was 0.194. To make sure that the results are reliable, i.e., that they adequately resemble the situation of intercorrelated criteria, we determined the adjusted RV Ghaziri index [79] between the correlation matrices of the criteria for the data matrices sampled in each iteration and the correlation matrix from Table 9. The Ghaziri index was 0.92, which indicates very high similarity.
From the above, one may conclude that the high similarity of rankings in our case, despite the dissimilarity of weights, may be case-specific. Nevertheless, it is an important finding: it indicates that a higher correlation among the performances of alternatives makes Hellwig's results less sensitive to the criteria weights.
In view of the above results, in Study 2 we relaxed the requirements for the correlation of criteria within decision matrices. We sampled 1000 decision matrices, ensuring only that the data for each criterion came from a normal distribution with the mean and standard deviation equal to the empirical ones. We then determined the same comparative indexes as in Study 1; their distributions are shown in Figure 3B. Here, the differences between Hellwig's CMI and CMII results are more evident. The average Kendall and Pearson correlations between CMI and CMII are 0.510 and 0.656, respectively. These correlations are significantly smaller than those obtained in Study 1 (p < 0.001 in the Mann-Whitney test). This shows that the rankings and ratings start to differ for EWMn- and EWMMM-based weights when the single-criterion performances are not bound by correlations. The distribution of the adjusted RV Ghaziri index (with an average value of 0) confirms that the sampled performance matrices differ substantially from the empirical one in terms of the correlations among criteria.
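The comparison of Study 1 and Study 2 distributions (e.g., of the per-replication Kendall or Pearson correlations between CMI and CMII) relies on the Mann-Whitney test. The sketch below is a minimal stdlib version assuming the large-sample normal approximation without tie or continuity corrections; in practice, `scipy.stats.mannwhitneyu` would be the usual choice and may give slightly different p-values because it handles those corrections. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value) using the normal
    approximation; no tie or continuity correction (an assumption)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

With 1000 replications per study, the two arguments would be the 1000 correlation values from Study 1 and the 1000 values from Study 2.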
Interestingly, in Study 2, the sampled data produced EWMn and EWMMM weight systems that were more similar than those in Study 1. The average Garuti index in Study 2 equals 0.585, and the distributions of the Garuti index in Studies 1 and 2 differ significantly (p < 0.001 in the Mann-Whitney test). However, the similarity of the results (rankings or ratings) is still weaker in Study 2. This strengthens our earlier observation that the correlation of the criteria may make the results insensitive to the normalization method used, regardless of how similar or different the resulting EWM weights are.

Conclusions
This research fits into the broader body of studies on how certain factors affect the final ranking obtained through multiple-criteria decision-making methods. In this study, we addressed how the choice of entropy-based weighting variant and normalization formula influences the ranking obtained through Hellwig's method. Four primary scientific goals were achieved in this paper.
The first goal was to analyze the impact of two variants of entropy-based weights (with and without max-min normalization) on the outcome obtained by Hellwig's method. It is important to emphasize that the major advantage of entropy-based weighting methods is their ability to cope with a lack of knowledge about the importance of criteria through simple calculations that use only the information contained in the criteria data, combined with an intuitive interpretation of the entropy measure. The second goal was to analyze the impact of three normalization methods (standardization, max-min, and vector normalization) on the outcome obtained by Hellwig's method; the comparison focused on the normalization methods most commonly used in Hellwig's and other MCDM methods. The third goal was to analyze the impact of combining the entropy weight methods with the normalization formulas (six combination modes) on the outcome obtained by Hellwig's method. The final goal was to compare the performance of the different variants of Hellwig's method in analyzing sustainable development in the education sector, which we did by comparing the rankings obtained under the different combination modes.
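The two entropy-based weighting variants compared in this paper (labelled EWMn and EWMMM in the text, i.e., without and with max-min normalization before computing entropy) can be sketched as below. This is a minimal NumPy version assuming the standard EWM formulas (column shares p_ij, entropy E_j = -(1/ln m) Σ_i p_ij ln p_ij, weights proportional to 1 - E_j) and benefit-type criteria only; the function names are hypothetical. It also shows why vector or sum normalization leaves the weights unchanged: both rescale each column by a constant, which cancels in p_ij, whereas max-min normalization also shifts the column.

```python
import numpy as np


def entropy_weights(X):
    """Entropy weight method for an m x n matrix of non-negative performances
    (rows = alternatives, columns = criteria)."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    P = X / X.sum(axis=0)  # column shares p_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # convention 0 * ln 0 = 0
    E = -plogp.sum(axis=0) / np.log(m)  # entropy of each criterion
    d = 1.0 - E  # degree of diversification
    return d / d.sum()


def max_min_normalize(X):
    """Max-min normalization for benefit criteria (assumed); a constant
    column would cause division by zero."""
    X = np.asarray(X, dtype=float)
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))


def vector_normalize(X):
    """Divide each column by its Euclidean norm."""
    X = np.asarray(X, dtype=float)
    return X / np.sqrt((X ** 2).sum(axis=0))
```

Under these assumptions, EWMn corresponds to `entropy_weights(X)` and EWMMM to `entropy_weights(max_min_normalize(X))`; `entropy_weights(vector_normalize(X))` returns the same weights as `entropy_weights(X)`.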
The study achieved its objectives through a comprehensive analysis of the impact of entropy-based weights and different normalization methods on the outcomes derived from Hellwig's method. This examination provided valuable insights into the effectiveness of various approaches in the decision-making process. The research contributes to the existing literature by offering a unique perspective on the combined use of entropy-based weights and normalization techniques within Hellwig's method. To the best of our knowledge, no prior studies have explored this aspect to such a degree, which makes our work a significant contribution to the field of multi-criteria decision-making.
The study also analyzes the impact of both entropy-based weights and normalization methods on the evaluation of sustainable development in the education area. In summary, the differences between the systems of weights obtained by the two entropy-based methods are significant, whereas the impact of the normalization formula on the final ranking obtained by Hellwig's method, for a fixed system of weights, is negligible. Surprisingly, however, the combination modes of entropy-based weights and normalization formulas in Hellwig's measure did not significantly affect the results in our real-world problem, i.e., the positions of EU countries in the rankings. In each of the obtained rankings, the countries with the highest levels of realization of SDG goals in the education sector were Sweden, Finland, and the Netherlands, while the lowest-ranked countries were Bulgaria, Romania, and Greece. However, we showed that the lack of significant differences in the EU case is related to the specificity of the problem and the above-average correlation of some criteria in the decision matrix. Our simulation studies showed that the results could differ more even for data samples with distributions and correlations similar to the empirical ones. When the correlations weaken, the choice of normalization technique leads to significant differences in Hellwig's rankings and ratings.
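The summary above concerns Hellwig's measure computed with a fixed system of weights under different normalization formulas. A minimal NumPy sketch follows, under several assumptions that may differ from the paper's exact formulas: all criteria are benefit-type, the pattern (ideal) object takes the column maxima of the normalized matrix, weights enter inside the Euclidean distance, and the critical distance is d0 = mean + 2 · (population) standard deviation of the distances. The function names are hypothetical.

```python
import numpy as np


def normalize(X, method):
    """Normalize columns by one of the three methods compared in the paper."""
    X = np.asarray(X, dtype=float)
    if method == "standardization":
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    if method == "max-min":
        return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    if method == "vector":
        return X / np.sqrt((X ** 2).sum(axis=0))
    raise ValueError(f"unknown normalization: {method}")


def hellwig_scores(X, weights, method="standardization"):
    """Weighted Hellwig measure (assumed form); higher is better."""
    Z = normalize(X, method)
    w = np.asarray(weights, dtype=float)
    pattern = Z.max(axis=0)  # ideal object, benefit criteria assumed
    dist = np.sqrt((w * (Z - pattern) ** 2).sum(axis=1))  # weighted distance
    d0 = dist.mean() + 2.0 * dist.std()  # critical distance, population SD
    return 1.0 - dist / d0


def rank_order(scores):
    """Rank 1 = highest score; ties broken by position (stable sort)."""
    s = -np.asarray(scores, dtype=float)
    return np.argsort(np.argsort(s, kind="stable"), kind="stable") + 1
```

Calling `hellwig_scores` with the same weights and each of the three `method` values, and comparing the outputs of `rank_order`, reproduces the kind of comparison summarized above; an alternative closest to the pattern object always scores exactly 1.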
Despite the valuable insights presented in this study, we acknowledge some limitations associated with the research design, particularly regarding the structure of the data sample. Real-world data may reveal interdependencies that cause different normalization and MCDM techniques to lead to similar results. Consequently, the findings of an empirical study may be constrained by the representativeness of the primary data set used (here, the Eurostat records of EU countries' performances). A more extensive and diverse set of samples could provide a broader understanding of the problem and indicate whether the observed regularities are case-specific.
Therefore, further research is needed on the relationship between the number of alternatives, the number of criteria, and the data structure, measured by the correlations between criteria. Simulation experiments with different levels of correlation among criteria could provide better-grounded conclusions on how Hellwig's method performs depending on the version of the EWM applied beforehand to produce the system of weights for the MCDM analysis. Additionally, exploring criteria with various descriptive statistics (e.g., mean, standard deviation, coefficient of variation, presence of outliers), examining the resulting weighting systems obtained through entropy methods, and checking the consistency of rankings obtained by multiple-criteria methods other than Hellwig's, such as TOPSIS or VIKOR, could reveal which of these techniques is more resistant to peculiar patterns of correlation among the evaluation criteria.

Figure 1. Comparison of two systems of entropy-based weights.
disparities among EU countries in achieving the SDG 4 goal (the pattern of differences was preserved). No country excelled or lagged in all criteria. Sweden, the Netherlands, and Finland received the top scores among EU countries in 2021 across all modes of Hellwig's measure. High Hellwig scores were also attained by Denmark, Slovenia, Estonia, and Luxembourg. Conversely, Bulgaria, Romania, and Greece recorded the lowest scores in 2021.

Figure 2. Box plots for the six combination modes of Hellwig's measures.

Figure 3. Box plots of correlation coefficients between CMI and CMII results in the simulation studies, with Ghaziri indexes of sampling quality.

Table 1. Indicators used to assess SDG 4 in EU countries in 2021.

Table 2. Entropy-based weights obtained using different formulas.

Table 3. The values and rank-ordering of EU countries obtained by the combination modes of Hellwig's measures.
Source: Authors' calculations.

Table 4. Kendall tau coefficients between rankings obtained by the six combination modes of Hellwig's measures.

Table 5. Pearson coefficients between rankings obtained by the six combination modes of Hellwig's measures.

Table 6. The values and rank-ordering of EU countries obtained by Hellwig's measures with equal weights.

Table 7. Kendall tau coefficients between rankings obtained by Hellwig's measures with equal weights.

Table 8. Pearson coefficients between rankings obtained by Hellwig's measures with equal weights.

Table 9. Pearson correlation coefficients between criteria.